ABSTRACT

Background Elucidating the genetic basis of variability in
hepatic gene expression is important for understanding
disease aetiology and variation in drug metabolism. To
date, no genome-wide expression quantitative trait loci
(eQTL) analysis has been conducted in the Han Chinese
population, the largest ethnic group in the world.
Methods We performed genome-wide eQTL mapping in a set
of Han Chinese liver tissue samples (n=64). The data were
then compared with published eQTL data from a Caucasian
population. We then examined the relationship of these
eQTLs with important pharmacogenes and with single
nucleotide polymorphisms (SNPs) identified by genome-wide
association studies (GWAS), in particular those identified
in the Asian population.

Results Our analyses identified 1669 significant eQTLs 
(false discovery rate (FDR) < 0.05). We found that 41% 
of Asian eQTLs were also eQTLs in Caucasians at the 
genome-wide significance level (p=10~^). Both cis- and 
trans-eQTLs in the Asian population were also more 
likely to be eQTLs in Caucasians (p<10~^). Enrichment 
analyses revealed that trait-associated GWAS-SNPs were 
enriched within the eQTLs identified in our data, so were 
the GWAS-SNPs specifically identified in Asian 
populations in a separate analysis (p<0.001 for both). 
We also found that hepatic expression of very important 
pharmacogenetic (VIP) genes (n=44) and a manually 
curated list of major genes involved in pharmacokinetics 
(n=341) were both more likely to be controlled by eQTLs 
(p<0.002 for both). 

Conclusions Our study provided, for the first time, a 
comprehensive hepatic eQTL analysis in a non-European 
population, further generating valuable data for 
characterising the genetic basis of human diseases and 
pharmacogenetic traits. 



INTRODUCTION 

The liver is a vital human organ for a variety of 
physiological processes, and plays a major role in 
drug metabolism. Elucidating the basis for the 
genetic variation in hepatic gene expression will 
significantly further our understanding of human 
diseases and pharmacogenomics. 

Expression quantitative trait loci (eQTL) mapping is one
of the most effective ways to discover gene regulation
networks. The eQTL method measures the variance in gene
transcription, followed by mapping of the genetic loci
affecting the expression of mRNA. To date, eQTL mapping has
been conducted in many species and in different tissues.
Thus far, four genome-wide eQTL studies in human liver
tissue have been performed, and numerous eQTLs have been
identified. However, no studies have been carried out in an
East Asian population, one of the major ethnicities in the
world. Detailed eQTL mapping in different populations is
crucial to understand the genetic heterogeneity in gene
regulation, and the evolutionary predisposition to diseases.
Notably, the hepatic metabolising capacity of Caucasian
populations has already been shown to be different from that
of Asians, as exemplified by variability in the metabolism of
alcohol, testosterone, bilirubin, etc. Hepatic eQTL studies
in East Asians will also be crucial for understanding the
genetic basis underlying various diseases and drug response
variability, particularly in the East Asian population. For
this reason, we have carried out genome-wide eQTL mapping in
64 normal livers of Han Chinese. A detailed comparison
between the Asian and Caucasian eQTLs was conducted.
Experimental validation in an independent sample set (n=54)
was also performed. The relationship between Asian eQTLs and
trait-associated single nucleotide polymorphisms (SNPs) as
well as pharmacogene expression was investigated.

MATERIALS AND METHODS 
Tissue sample collection 

Normal (non-diseased) liver tissues were previously
collected from 64 Chinese donors (all male) who
provided informed consent. The average age of the
subjects was 34.52 ± 5.98 years. The independent
sample set of liver tissue (n=54) (non-diseased
healthy male donors, aged 35.65 ± 7.34 years) was
newly collected at Shanghai Jiao Tong University
Affiliated First People's Hospital (Shanghai, China),
Zhangzhou Hospital Affiliated to Fujian Medical
University (Zhangzhou, China), and Shandong
University Affiliated Qianfoshan Hospital (Jinan,
China). This study was approved by the ethics
committees of the medical faculties of Shanghai Jiao
Tong University Affiliated First People's Hospital.

RNA sample preparation, hybridisation 

Total RNA of the human liver tissue samples was
extracted and 1.65 µg of each RNA sample was
labelled and hybridised to the Agilent 44 K G4112F arrays
(GPL4133) at Shanghai genomePilot Technology, Inc (Shanghai,
China).

Gene expression microarray data preprocessing 
and normalisation 

The dye-normalised and post-surrogate processed signal for the
green channel, gProcessedSignal, obtained from Agilent's
Feature Extraction Software was used for downstream analyses.
The raw expression data for the 64 samples were evaluated for
individual array quality (MA plots), array intensity distributions
(box plots and density plots), and between-array differences
(heat maps representing the distance between arrays) using the
arrayQualityMetrics package. One outlier sample was dropped
based on the arrayQualityMetrics default criteria. The quantile
normalisation method was used to normalise the data, and
the average values were obtained for the replicate spots. The
expression intensity values were log2 transformed. The
processes were implemented using Limma. After data collection, we
ran a pipeline to re-annotate the Agilent G4112F microarray for
the current reference assembly (NCBI Build 37.3). A total of
29 190 oligonucleotides on the array were validated for
subsequent analyses.
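
The normalisation steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than with Limma, the function names and the toy intensities are hypothetical, and ties are broken by position instead of being averaged, so it should be read as an illustration of the idea and not as the pipeline actually used.

```python
import math


def quantile_normalise(columns):
    """Quantile-normalise equally long intensity lists (one list per array).

    Simplification: tied values are ranked by position, not averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # The reference distribution is the mean of the k-th smallest values.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices, smallest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


def log2_transform(columns):
    """Apply a log2 transformation to every intensity value."""
    return [[math.log2(value) for value in col] for col in columns]


# Hypothetical intensities for two arrays and four probes.
arrays = [[5.0, 2.0, 3.0, 4.0], [8.0, 1.0, 6.0, 2.0]]
normalised = quantile_normalise(arrays)
print(normalised)  # [[6.5, 1.5, 2.5, 5.0], [6.5, 1.5, 5.0, 2.5]]
print(log2_transform(normalised))
```

After normalisation every array shares the same set of values, which is what makes intensities comparable between arrays before association testing.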

DNA extraction, GWAS genotyping, and quality control 

Total DNA of the human liver tissue samples was extracted
using the RNA/DNA Mini Kit (Qiagen, Hilden, Germany).
DNA was diluted to a working concentration of 50 ng/µL for
SNP chip genotyping. The genome-wide scan was performed
using the Affymetrix Genome-Wide Human SNP Array 6.0.
Quality control (QC) filtering of the GWAS data was performed
by excluding arrays with a contrast QC<0.4 from further data
analysis. The sex of each sample was determined using
Genotyping Console, and none of them mismatched the established
and annotated sexes. Genotype data were generated using the
birdseed algorithm. SNPs were further filtered based on
annotation, call rate, Hardy-Weinberg equilibrium, and allele
frequency information. As a result, among the initially genotyped
909 622 SNPs on the Affymetrix Genome-Wide Human SNP
Array 6.0 platform, 4023 duplicated SNPs and 1175 SNPs that
did not have chromosomal annotation were first removed from
further analysis. We then removed 54 656 SNPs that had a
genotyping call rate <95%, and 737 SNPs that deviated significantly
from Hardy-Weinberg equilibrium at the threshold (p<10~^).
Considering the small sample size of our dataset, we removed a
further 546 548 SNPs that had minor allele frequency (MAF)
<25%. In total, 302 483 SNPs remained for further analysis.
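
As a worked check of the filtering arithmetic, the following sketch tallies the counts reported in this section. The number of SNPs removed by the MAF filter is not taken from the text but derived as the difference between the SNPs left after the other filters and the 302 483 SNPs reported as remaining; the variable names are illustrative only.

```python
# Counts reported in the QC description.
genotyped = 909_622
removed_before_maf = {
    "duplicated": 4_023,
    "no chromosomal annotation": 1_175,
    "call rate < 95%": 54_656,
    "Hardy-Weinberg deviation": 737,
}
remaining = 302_483

# SNPs left once every filter except the MAF filter has been applied.
after_other_filters = genotyped - sum(removed_before_maf.values())

# Number the MAF filter must have removed for the reported total to hold.
implied_maf_removed = after_other_filters - remaining

print(after_other_filters)   # 849031
print(implied_maf_removed)   # 546548
```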

Quantification of ancestry and sample independence test 

All the individuals are of self-reported Han Chinese ancestry.
Multidimensional scaling (MDS) analysis was used to calculate
the genome-wide IBS (identity-by-state) pairwise distances
in PLINK v1.07. All the samples resided in a single cluster, and
no outlier was detected; thus all samples were used for analyses.
Sample independence was also examined using the IBD
(identity-by-descent) analysis in PLINK. The results showed that
all individuals were independent of each other. No sample
contamination, duplication, or significant family relationship
was identified.
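
To make the IBS distance concrete, here is a minimal sketch of a pairwise IBS calculation on genotypes coded as minor allele counts (0, 1, or 2). The function names and toy genotypes are hypothetical. PLINK's own computation also handles missing genotypes, which this sketch omits, and the MDS step itself is not shown.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state by two samples."""
    # At each SNP two samples share 2, 1 or 0 alleles.
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]
    return sum(shared) / (2 * len(shared))


def ibs_distance_matrix(genotypes):
    """Pairwise distances defined as 1 minus the IBS similarity."""
    n = len(genotypes)
    return [
        [1.0 - ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: three samples, four SNPs.
samples = [
    [0, 1, 2, 2],
    [0, 2, 0, 2],
    [0, 1, 2, 2],
]
distances = ibs_distance_matrix(samples)
print(distances[0][1])  # 0.375
print(distances[0][2])  # 0.0
```

A distance matrix of this kind is the input to MDS, where samples of similar ancestry fall into one cluster and duplicated samples show a distance close to zero.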

eQTLs mapping 

We tested all expression traits for their associations with each of
the QC passed SNPs using PLINK, which correlates allele
dosage with changes in the trait. Only the SNPs within ± 2
megabase (Mb) of the transcription start or stop of the
corresponding gene were tested for putative cis-eQTLs. All the
rest of the SNPs were tested for association with each expression
trait for trans-eQTLs. To correct for the number of tests
performed, a false discovery rate (FDR) of 0.05 was used as a
cut-off for statistical significance.
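
The two ingredients of the mapping step, a regression of expression on allele dosage and the cis/trans assignment, can be sketched as follows. This is an illustration in plain Python, not the PLINK implementation. The function names, coordinates, and toy data are hypothetical, and the cis window is assumed here to run from 2 Mb upstream of the transcription start to 2 Mb downstream of the transcription stop, so SNPs inside the gene count as cis.

```python
CIS_WINDOW_BP = 2_000_000  # ±2 Mb, as stated in the text


def dosage_regression(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage (0/1/2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_pair(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end):
    """Label a SNP-gene pair 'cis' or 'trans' under the assumed window."""
    if snp_chrom != gene_chrom:
        return "trans"
    lo, hi = sorted((gene_start, gene_end))
    if lo - CIS_WINDOW_BP <= snp_pos <= hi + CIS_WINDOW_BP:
        return "cis"
    return "trans"


# Hypothetical data: expression rises by one unit per copy of the allele.
slope, intercept = dosage_regression([0, 1, 2, 1, 0, 2], [1.0, 2.0, 3.0, 2.0, 1.0, 3.0])
print(slope, intercept)  # 1.0 1.0
print(classify_pair("1", 1_500_000, "1", 3_000_000, 3_050_000))  # cis
print(classify_pair("1", 500_000, "1", 3_000_000, 3_050_000))    # trans
```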

There were incomplete records for demographic and clinical
information for these samples. In order to assess the effect of
hidden cofactors on eQTL mapping, we performed corrections
for gene expression variance using PEER (probabilistic
estimation of expression residuals). Interestingly, the majority of
significant cis-eQTLs remained when using the PEER corrected
residuals for gene expression, suggesting that covariates actually
had a minimal effect on the gene expression variance in our
data.

To assess the statistical power to detect eQTLs, a power
analysis was performed using the GWAPower program. We
found that using our 63 samples (one sample was removed after
data cleanup), with a validation p value of 10~^ (a threshold at
which we expect to be able to follow up with at least in silico
replication studies), we have about 80% power to detect an
eQTL accounting for 25% of the phenotypic variation in
expression levels.

Comparison of eQTLs between Asian and Caucasian 
populations 

To check the effect of ethnicity on eQTL mapping, we compared
the results with those of a Caucasian dataset (GEO accession:
GSE26106) by Innocenti et al, which was generated with the
same gene expression microarray platform (GEO accession GPL4133). The
GSE26105 dataset has 205 Caucasian liver samples which have 
both genotype and gene expression data. The same preprocessing 
and eQTLs mapping methods used in the Asian dataset were 
applied to the raw data of the gene expression and genotype 
downloaded from GEO. We focused on 64 964 SNPs and 24 340 
gene probes which passed the QC in both datasets and have the 
genomic loci annotations. 

To assess the eQTL replication between the two populations
more completely, we also performed a post-hoc genome-wide
imputation to infer genotypic information for SNPs that had
not been genotyped on our platform, using the IMPUTE2
program after prephasing the genotypes with SHAPEIT. The
1000 Genomes Phase I data (NCBI build 37) were used as a reference
panel and default parameters were used in prephasing and
imputation. After imputation, we obtained imputed data for
37 574 750 and 38 049 377 SNPs in the Han Chinese and 
Caucasian populations, respectively. The imputed data were 
further filtered based on the imputation quality (>30%) and 
MAF (>25% for Chinese and >5% for Caucasians), after 
which information for 2 624 722 and 6 763 377 SNPs remained 
for Chinese and Caucasian datasets, respectively, for further 
analysis. 

Statistical analysis for enrichment tests and population 
divergence comparison 

Trait-associated SNPs were obtained from the NHGRI catalogue
(http://www.genome.gov/26525384). We downloaded all catalogued
SNPs associated with human traits with genome-wide significance
(10~^) (n=6437), among which we defined the Asian
GWAS-SNPs (n=748) as the subset of SNPs identified in East Asian
populations, mainly Chinese and Japanese populations. The very
important pharmacogenetic (VIP) genes (n=49) were obtained
from the Pharmacogenomics Knowledgebase
(http://www.PharmGKB.org). The list of major pharmacokinetic genes (n=409)
was obtained from a previous study.
The enrichment significance of one dataset in another was 
calculated by the χ² test, with all SNPs or genes from the Asian 
dataset as a background. A value of p<0.05 was considered sig- 
nificant. The binomial test was used for evaluating the enrich- 
ment of correlation with multiple gene probes in eQTLs 
hotspots. 
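
An enrichment test of this kind can be sketched on a 2×2 contingency table. The sketch below assumes a Pearson χ² test without continuity correction; the name of the test is not legible in the text above, so this choice is an assumption. The counts are hypothetical and serve only to show the calculation.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value for the table [[a, b], [c, d]].

    No continuity correction is applied. With one degree of freedom the
    upper-tail probability is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, not taken from the study:
# rows are eQTL SNPs and background SNPs,
# columns are trait-associated and not trait-associated.
stat, p_value = chi_square_2x2(10, 90, 20, 880)
print(round(stat, 2))   # 18.71
print(p_value < 0.001)  # True
```

With small expected cell counts an exact test is generally the safer choice, so the χ² form here should be seen as one reasonable option only.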

To assess the influence of allele frequency on eQTL replication
between populations, we compared the distribution of Fst (fixation
index) values for the significant eQTLs between overlapped
and non-overlapped groups, using the test. An Fst value of 0.5
was used as a cut-off to compare the difference in the number of
SNPs with or without overlap between the two populations. Fst
values were obtained from the FstSNP-HapMap3 database.

Experimental confirmation of the gene expression and SNP 
gene expression association 

The mRNA levels of seven genes (DDT, ERAP2, MRPL43,
FADS1, BRCA1, CCND2, and PTPRE) were quantified in the
livers using quantitative PCR (qPCR). Primer sequences are
provided in online supplementary table S5. SNPs that were
significantly associated with these gene expression profiles were
also genotyped using PCR sequencing. Primer sequences are
also listed in online supplementary table S5. The relationships
between qPCR data and microarray data or SNP genotypes
were determined using linear regression, with the significance
cut-off set as 0.05.

RESULTS 

Gene expression profiling and SNP genotyping 

We conducted genome-wide eQTL mapping in a set of Han
Chinese liver tissue samples (n=63). SNPs were genotyped
using the Affymetrix SNP 6.0 chip, and genome-wide gene
expression levels were profiled using the Agilent G4112F array. After QC,
302 483 SNPs remained for analysis. Microarray expression
probes were re-annotated using a previously reported pipeline. A
total of 29 190 probes were considered to be valid for
subsequent analyses.

eQTLs mapping 

At the 5% FDR level (p<9.45×10⁻⁹), we identified a total of
1669 eQTLs with 1322 SNPs significantly associated with the
expression of 282 genes. Among these eQTLs, 1465 were classified
as cis-eQTLs including 1198 SNPs for 217 genes, and 204
as trans-eQTLs, with 178 SNPs significantly associated with 68
genes. The full list of association results at a study-wide
significance level is provided in online supplementary table S1.
Figure 1 shows the distribution of cis- and trans-eQTLs at the
p<10⁻⁵ level, eQTL hotspots, as well as the location of genes
in the entire genome.
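
The nominal p value that corresponds to a 5% FDR can be estimated from the counts reported in this paper. The sketch assumes, without confirmation from the text, that the FDR was controlled with the Benjamini-Hochberg procedure and that every QC-passed SNP was tested against every valid probe. Under those assumptions the threshold has the same mantissa (9.45) as the one quoted for the 5% FDR level.

```python
# Counts reported in the Methods and Results.
n_snps = 302_483        # SNPs remaining after QC
n_probes = 29_190       # valid expression probes
n_significant = 1_669   # eQTLs declared significant
fdr = 0.05

# Assumption: one test per SNP-probe pair.
n_tests = n_snps * n_probes

# Benjamini-Hochberg: the k-th smallest p value is compared with fdr * k / m,
# so with k significant results the cut-off is fdr * k / m.
threshold = fdr * n_significant / n_tests

print(n_tests)               # 8829478770
print(f"{threshold:.2e}")    # 9.45e-09
```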

We also studied the physical distance distribution (base pairs) 
of the most significant eQTLs for each gene to the gene's tran- 
scription start site (TSS). We found that the most significant 
eQTLs were enriched around the TSS (figure 2), which is also
consistent with previous reports.

Comparison between Caucasian and Asian eQTLs 

We set out to compare the eQTLs in the Han Chinese population
with the previously published data in an American population.
Given the different platforms used in the two studies, we focused
our analyses only on the common SNPs (n=64 964) and gene
probes (27 340) between the two datasets. We found that at the
general genome-wide significance level (p<10~^), 41% (113 out
of 277) of Asian SNP-gene association pairs were also significant
pairs in the Caucasian population (table 1). To confirm whether
there was a significant overlap in eQTLs between the two
populations, we used a liberal p value cut-off of 10~^ for both
populations, and found that an eQTL in the Asian population was also
significantly more likely to be an eQTL in Caucasians
(p<2.2×10⁻¹⁶). This enrichment remained significant for
both cis- and trans-eQTLs (p<2.8xl0~^ for both) (see online
supplementary table S2).

To further evaluate the eQTL overlap between the two
populations, we performed a genome-wide post-hoc imputation
analysis to infer genotypes for the SNPs that had not been
genotyped in both sample sets. After imputation, 19 703 SNPs were
found to be significantly (p<9.45×10⁻⁹) associated with gene
expression in the Han Chinese dataset, which included 18 186
cis-eQTLs and 1517 trans-eQTLs. After comparison with the
Caucasian dataset, we found that 11 841 (60%) SNPs were also
significantly associated with expression of the same genes at the
10~^ level in the Caucasian dataset, which included 11 284
(62%) cis-eQTLs and 557 (36.7%) trans-eQTLs.
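
The replication percentages quoted in this section follow directly from the reported counts, as the short calculation below shows. The labels and the helper name are illustrative only.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in per cent."""
    return 100.0 * part / whole


# (replicated in Caucasians, total in Han Chinese) as reported in the text.
overlaps = {
    "genotyped SNP-gene pairs": (113, 277),
    "imputed, all eQTLs": (11_841, 19_703),
    "imputed, cis-eQTLs": (11_284, 18_186),
    "imputed, trans-eQTLs": (557, 1_517),
}

for label, (shared, total) in overlaps.items():
    print(f"{label}: {percent(shared, total):.1f}%")
# genotyped SNP-gene pairs: 40.8%
# imputed, all eQTLs: 60.1%
# imputed, cis-eQTLs: 62.0%
# imputed, trans-eQTLs: 36.7%
```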

Given the allele frequency difference for the SNPs between 
the two populations, it is possible that population divergence 
may have affected the identification and confirmation of eQTLs 
in different populations. To test this hypothesis, we calculated 
the Fst (fixation index) value for the significant eQTLs overlapped
and non-overlapped between the two populations.
Interestingly, using an Fst value of 0.5 as a cut-off, we found 
that about 4% of the overlapped eQTLs had an Fst > 0.5 com- 
pared to about 7% in non-overlapped eQTLs. Despite a small 
number, this comparison was statistically significant 
(p = 7.6xl0~^), suggesting that the SNPs with less population 
divergence were more likely to be eQTLs shared by the two 
populations (data not shown). 

Enrichment of trait-associated GWAS SNPs in Asian eQTLs 

To test the hypothesis that the GWAS-identified SNPs significantly
associated with human traits are also more likely to be
eQTLs, we checked the enrichment of trait-associated GWAS
SNPs in Asian eQTLs. We first focused our analysis on the
genome-wide significant (p<10~^) SNPs associated with any
trait in all populations deposited in the National Human
Genome Research Institute (NHGRI) database (n=6437).
Among 1322 SNPs significantly (FDR<0.05) associated with
gene expression in Han Chinese livers, 17 were also
trait-associated SNPs (see online supplementary table S1),
which was significantly enriched compared to the SNPs that
were not significant eQTLs (FDR>0.05) (p<1.4xl0"^) (see
online supplementary table S3). We further divided the
GWAS-SNPs into Asian (Chinese and Japanese populations
only) (n=748) and European (n=5689) populations. Out of
these 17 SNPs, four SNPs were also trait-associated SNPs in
the Asian population, which was significantly enriched in the
Han Chinese eQTLs (p = 0.0009) (see online supplementary
table S3). Examples here included rs12506899 located in the
α-fetoprotein gene (AFP), which was significantly associated with
the cancer antigen 19-9 (CA19-9) in a GWAS conducted in a
Han Chinese population. In our data this SNP was significantly
associated with AFP gene expression (p = 1.42x 10~^^).
More interestingly, rs3077 located in the HLA-DPA1 gene,
which exhibited a relatively low allele frequency (MAF = 0.11)
in Caucasians but was reported to be more common in Asians
with an allele frequency of 0.62, was significantly associated
with increased risk for hepatitis B virus (HBV) infection in an
Asian population in a recent GWAS. Similarly, we also
noticed that rs9277378, a significant HLA-DPB1 eQTL which is in
complete linkage disequilibrium (LD) with rs9277535 from
Human Liver eQTL Visualization (p<10⁻⁵)
Figure 1 The combined plot depicting all eQTLs for hepatic gene expression traits with p<10⁻⁵ in our study. The bottom plot shows the
genome-wide distribution of eQTL results, with the SNP distribution on the x axis and expression probes on the y axis. Each dot represents a
significant SNP-expression pair. Cis-eQTL associations are shown in a diagonal direction and trans-eQTLs are shown in a vertical direction. Darker
colour indicates more significant association. The middle plot shows hotspot eQTL enrichment in correlation with expression probes, indicating the
eQTLs as possible key regulators affecting expression of multiple genes. The size of the red circles indicates the number of gene expression traits
correlated with the particular SNP. The top plot is the enrichment scale (-log p value), which shows the enrichment of correlation with multiple
genes (-log p value from binomial test after Bonferroni correction). eQTL, expression quantitative trait loci; SNP, single nucleotide polymorphism.



HapMap Asian data, was significantly associated with HBV
infection in Asian populations in two previously published
GWAS. On the other hand, 15 out of the aforementioned
19 SNPs were GWAS SNPs in European populations, and were
also significantly enriched in the Han Chinese eQTLs
(p<0.0001) (see online supplementary table S1).



Association between Asian eQTLs and expression variability 
of pharmacogenes 

The liver is the most important organ for drug metabolism. We 
hypothesised that genetic polymorphisms can explain inter- 
individual variability in pharmacogene expression, which would 
be of importance to pharmacogenetics. To confirm this, we tested 






Wang X, et al. J Med Genet 2014;51:319-326. doi:10.1136/jmedgenet-2013-102045



Quantitative traits 



[Figure 2 plot; x axis: SNP-TSS distance (kb), from -200 to 200]

Figure 2 Histogram showing distances from each gene's best 
associated SNP to its TSS. Negative and positive values denote SNPs 5' 
and 3' of TSS, respectively. The unit interval was set as 10 kb. eQTL, 
expression quantitative trait loci; SNP, single nucleotide polymorphism; 
TSS, transcription start site. 

the overlap between Asian eQTLs and the VIP genes (n=49), a list
of genes drawing most attention in the pharmacogenetic research
area and recently identified by the Pharmacogenomics
Knowledgebase (http://www.PharmGKB.org). As a result, 44 genes
were profiled in our platform. Four genes, BRCA1, CYP2D6,
CYP3A5, and GSTT1, were significantly (FDR<0.05) associated
with at least one eQTL in our dataset (see online supplementary
table S1). Compared to the genes without significant eQTLs,
the expression of VIP genes was significantly more
likely to be controlled by eQTLs (p=0.002) (see online
supplementary table S4). To expand this analysis to other important
pharmacogenes, we tested the association between eQTLs and
409 genes encoding major phase I/phase II drug metabolism
enzymes, transporters, as well as nuclear factors regulating
pharmacogene expression. Among these 409 genes, 341 were
profiled in our study, and 17 genes (see online supplementary table
S1) were found to be significantly associated with at least one
eQTL. Similar to the VIP genes, this was also a significant
enrichment (p=2.8xl0~^) (see online supplementary table S4).
Notably, a number of glutathione S-transferase (GST) genes including
GSTA4, GSTM1, GSTM2, GSTM2P1, GSTM4, GSTM5,
GSTT1, and GSTT2 were found to be significantly associated with
at least one eQTL (see online supplementary table S1).

Experimental validation of the gene expression and SNP 
gene expression associations 

In order to confirm the findings using independent techniques,
we quantified mRNA levels of seven randomly selected genes
that were significantly associated with eQTLs, using qPCR. Among



Table 1 Overlap of eQTLs (p<10 ^) between Asian and Caucasian populations

Dataset          No. of significant SNP-gene pairs    No. of SNPs    No. of genes
Asian all        277                                  215            138
Caucasian all    684                                  486            435
Overlap          113                                  100            61

eQTLs, expression quantitative trait loci; SNPs, single nucleotide polymorphisms.



these seven genes, five (DDT, ERAP2, MRPL43, FADS1, and
BRCA1) were significantly associated with cis-SNPs, and two
(CCND2 and PTPRE) were associated with trans-SNPs. The
qPCR measurements were then correlated with gene expression
profiles in the microarray. We found that qPCR measurements of
all seven genes were significantly correlated with their microarray
profiles (p<5×10⁻⁴ for all). To confirm the SNP genotypes
determined by the DNA chip, we also performed Sanger sequencing
for the seven SNPs significantly associated with these genes'
expression. We found that the genotype of one SNP (rs1006771)
had 100% concordance with the DNA chip data, while the
remaining six SNPs all had 98% concordance, with a discrepancy
in the genotype of one sample for each SNP between sequencing
and DNA chip results. Note that this discrepancy occurred
randomly in different samples. Meanwhile, the qPCR measurements
of six genes (DDT, ERAP2, MRPL43, FADS1, BRCA1, and
PTPRE) were also significantly associated with the originally
identified SNPs (p<0.03 for all), while no association between
the qPCR level of CCND2 and the originally identified
trans-SNPs was observed (p>0.18 for all) (table 2).

To further validate the reliability of the eQTL mapping in
our study, we also experimentally validated the aforementioned
seven SNP-gene pairs in an independent liver tissue set (n=54).
Again, gene expression and SNP genotype were determined
using qPCR and sequencing. We found that all five cis-eQTLs
identified in the original sample set were also significantly
associated with gene expression profiles in the new sample set
(p<0.008 for all), while the two trans-eQTLs were not (p>0.4
for both).

DISCUSSION 

Previous studies on eQTL mapping in human liver have demonstrated
its power for understanding inter-individual variability in
disease aetiology and response to therapeutic treatments. We
have performed, for the first time, an eQTLs study in human livers 
of a Han Chinese population. This unique dataset may extend our 
understanding of the genetic basis underlying the variability in 
gene expression, aetiology of human diseases, and various traits in 
pharmacotherapy. 

Although we have a relatively small sample size (n=64), our
study was able to replicate a large proportion of hepatic eQTLs
identified in previous studies, suggesting the high quality of our
dataset. This is also confirmed by our experimental validation.
Without considering the SNPs in linkage disequilibrium, 41% of
significant eQTLs in Asians were consistent with those found in
Caucasians. However, by including imputed genotypic data, this
number increased to 60%, further indicating that the regulatory
mechanism underlying many SNP-gene expression correlations
is actually common across the entire human population. By
dividing the eQTLs into cis (±2 Mb) and trans, at the 10~^ level,
about 30% of Asian cis-eQTLs were also Caucasian cis-eQTLs,
while only 0.023% of Asian trans-eQTLs were consistent with
Caucasian findings. This further highlighted the reliability of
cis-eQTL mapping as indicated in previous studies. This
inference was also confirmed by our experimental validation studies
conducted in an independent sample cohort. However, in spite
of the small proportion of overlapping trans-eQTLs between
the two populations, this overlap still represented a statistically
significant enrichment, suggesting that these trans-eQTLs are
likely to be true signals.

eQTL information was deemed to be important to establish
the causal role of genetic variants and genes involved in human
disease.26-28 We further confirmed the previous findings
that trait-associated GWAS-SNPs are more likely to be eQTLs.
Table 2 qPCR validation of gene expression and SNP-gene correlation

                                                     Correlation with Microarray data  qPCR data        Independent
                                                     microarray data  with SNPs        with SNPs        sample set (n=54)
Gene    eQTLs association  Probe         SNP         β      p Value   β      p Value   β      p Value   β      p Value
DDT     cis                A_23_P17769   rs1006771   0.98   1.90E-18  0.67   3.11E-25  0.45   6.97E-14  0.58   7.4E-03
MRPL43  cis                A_23_P382154  rs3824783   0.81   1.69E-07  -1.16  3.59E-24  -0.64  2.61E-02  -1.21  9.01E-09
ERAP2   cis                A_23_P30243   rs10434709  1.12   2.91E-31  2.35   8.41E-26  1.80   3.02E-20  -2.06  1.76E-09
FADS1   cis                A_24_P192994  rs174547    0.81   4.36E-13  -1.27  7.87E-09  -1.13  5.83E-07  -1.04  5.02E-06
BRCA1   cis                A_23_P207400  rs9911630   1.08   6.28E-09  -0.84  6.61E-11  -0.79  0.00135   -0.52  6.19E-03
CCND2   trans              A_24_P270235  rs1404608   0.71   4.77E-04  0.38   8.97E-11  0.12   3.80E-01  -0.13  0.53
PTPRE   trans              A_23_P138495  rs1463389   2.17   3.04E-06  0.34   3.03E-10  0.50   3.07E-02  -0.53  0.41

eQTLs, expression quantitative trait loci; SNPs, single nucleotide polymorphisms; qPCR, quantitative PCR.



More importantly, we hypothesised that the Asian eQTLs may
be particularly useful for understanding the mechanism
underlying genotype-phenotype correlations in Asian populations.
Indeed, we found that GWAS-SNPs identified in Asian
populations were significantly enriched in our eQTLs. One interesting
example is the SNPs associated with HBV infection. HBV is
endemic in East Asian populations, and over 75% of the world's
estimated 350 million carriers are located in Western Pacific and
South East Asian countries. The prevalence rate of HBV
infection in Han Chinese is extremely high (up to 11%). Previous
GWAS identified two HLA-DP loci, rs3077 in HLA-DPA1 and
rs9277378 in HLA-DPB1, that were significantly associated with
increased risk for HBV infection in Asian populations. Our
data revealed that these two SNPs were actually significant
eQTLs for HLA-DPA1 and HLA-DPB1 gene expression. More
interestingly, the allele frequencies of rs3077 and rs9277378 are
significantly different between Caucasian, Asian, and African
populations, with the rare alleles among Caucasians being
common alleles in Asians and Africans according to the
HapMap data (MAF for rs3077 is 0.11, 0.61, and 0.76, and for
rs9277378 is 0.29, 0.54, 0.82 among the three populations,
respectively). This may indicate that altered hepatic gene
expression of these two genes among these populations confers
differential risks for HBV infection. This is further supported by the
epidemiological observation that both Asian and African
populations have much higher HBV infection rates than Caucasians.

We also observed that a few SNPs (rs174547, rs174548, and
rs174549) consistently identified from GWAS in both European
and Asian populations were associated with multiple metabolic
perturbations and lipid metabolism traits. We found that
these SNPs were significantly associated with gene expression of
FADS1, one of the fatty acid desaturase genes. Our finding is
further supported by studies in Caucasian populations. Both
our original and validation studies in additional samples
confirmed the significant association between rs174547 and FADS1
gene expression. This further indicates that mechanisms
underlying expression regulation of many genes are actually shared by
different populations, and consequently the associated diseases
and traits may have a common natural history.

Besides diseases and trait-associated SNPs and genes, we also
tested the association between Asian eQTLs and important
pharmacogenes. As expected, four VIP genes and 17 major genes involved
in pharmacokinetics were significantly associated with at least one
eQTL. This underscored the importance of our dataset in
understanding the inter-individual difference in drug response and
toxicities. We observed that hepatic expression levels of a group of GST
genes were significantly affected by eQTLs. Expression of several
important P450 genes including CYP2D6, CYP3A5, CYP3A7, and
CYP4V2 was also found to be controlled by eQTLs. Although high
impact polymorphisms from these genes have been identified in
previous studies, our study provided new candidate polymorphisms
that may be important to pharmacogenetics in Asian populations.
CYP3A7, the most important P450 gene in fetal liver, was
significantly associated with multiple SNPs; further investigations are
therefore warranted to address the question of whether these
polymorphisms confer susceptibility to the inter-patient differences in
drug efficacy or toxicity in paediatric populations.

Replication of eQTL results between populations has often been
observed to be highly variable across studies, which could be
attributed to many reasons.^ We found in our study that differences
in allele frequency can significantly affect eQTL replication in
different populations. An SNP with a higher allele frequency in one
population may have greater power to detect an association with gene
expression than the same SNP at a lower allele frequency in
another population. In addition, as we calculated in the power
analysis, the small sample size in our study might also limit the power
for detecting eQTLs with moderate effect, which would further contribute
to non-replicable eQTLs. Our sample set was also limited by the
incomplete covariate information (demographic, clinical, etc)
collected during the sample procurement process. To address these
limitations, our future goal is to collect a larger sample set and to
perform multi-population meta-analyses.

In conclusion, our eQTL analysis, the first in the East Asian Han
Chinese population, revealed both homogeneity and heterogeneity
in the genetic variation underlying gene expression among different
human populations. Many of our findings provide further
supportive evidence for recent genomic discoveries in human
diseases and pharmacogenetic traits and, more importantly,
foster new rationales for continued investigation. Our data
thus provide a valuable resource that complements the existing data
in other populations.

Author affiliations 

^Department of General Surgery, Shanghai First People's Hospital, Medical College, 
Shanghai Jiaotong University, Shanghai, China 

^Department of Pathology, Shanghai First People's Hospital, Medical College, 
Shanghai Jiaotong University, Shanghai, China 

^Department of General Surgery, Shandong University Affiliated Qianfoshan Hospital, 
Jinan, China 

^Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, 
Bio-X Institutes, Ministry of Education, Shanghai Jiao Tong University; Shanghai 
genome Pilot Institutes for Genomics and Human Health, Shanghai, China

^Department of Hepatobiliary Surgery, Zhangzhou Hospital Affiliated to Fujian
Medical University, Zhangzhou, China

Acknowledgements We are deeply grateful to all the participants as well as to 
the doctors working on this project. This work was supported by the National 
Natural Science Foundation of China (No. 81000188, 81270557). 






Wang X, et al. J Med Genet 2014;51:319-326. doi:10.1136/jmedgenet-2013-102045






Contributors ZHP supervised sample recruitment. XLW, HMT, MJT and ZQL 
conducted data analyses and drafted the manuscript. JWF, LZ, XS, JMX, GQC, DWC 
and ZWW recruited samples. THX, JYZ, LH, SYW, XP and SYQ performed or 
contributed to the main experiments. All authors critically reviewed the manuscript 
and approved the final version. 

Funding The Natural Science Foundation of China. 

Competing interests None. 

Patient consent Obtained. 

Provenance and peer review Not commissioned; externally peer reviewed. 

Additional data file The full eQTL dataset is accessible in an online data
resource (http://analysis2.bio-x.cn/SHEsisMain.htm). The data were also deposited in
the NCBI GEO database (accession number: GSE53792). The following additional
data are available. Supplemental file 1 contains table S1, which provides a full list
of association results at a study-wise significance level (FDR <0.05) for gene
expression and genotyping data. Supplemental tables S2-S5 include enrichment
analyses and primer sequences.

Open Access This is an Open Access article distributed in accordance with the 
Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which 
permits others to distribute, remix, adapt, build upon this work non-commercially, 
and license their derivative works on different terms, provided the original work is 
properly cited and the use is non-commercial. See:
http://creativecommons.org/licenses/by-nc/3.0/



REFERENCES 

1 Brem RB, Kruglyak L. The landscape of genetic complexity across 5,700 gene 
expression traits in yeast. Proc Natl Acad Sci USA 2005;102:1572-7. 

2 Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends
Genet 2001;17:388-91.

3 Majewski J, Pastinen T. The study of eQTL variations by RNA-seq: from SNPs to
phenotypes. Trends Genet 2011;27:72-9.

4 Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, Kasarskis A, Zhang B, 
Wang S, Suver C. Mapping the genetic architecture of gene expression in human 
liver. PLoS Biol 2008;6.
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060107 (accessed Jun 2013).

5 Innocenti F, Cooper GM, Stanaway IB, Gamazon ER, Smith JD, Mirkov S, Ramirez J, 
Liu W, Lin YS, Moloney C. Identification, replication, and functional fine-mapping of 
expression quantitative trait loci in primary human liver tissue. PLoS Genet 2011;7:e1002078.
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002078 (accessed Jun 2013).

6 Schroder A, Klein K, Winter S, Schwab M, Bonin M, Zell A, Zanger UM. Genomics 
of ADME gene expression: mapping expression quantitative trait loci relevant for 
absorption, distribution, metabolism and excretion of drugs in human liver. 
Pharmacogenomics J 2011;13:12-20.

7 Greenawalt DM, Dobrin R, Chudin E, Hatoum IJ, Suver C, Beaulaurier J, Zhang B, 
Castro V, Zhu J, Sieberts SK, Wang S, Molony C, Heymsfield SB, Kemp DM, Reitman ML, 
Lum PY, Schadt EE, Kaplan LM. A survey of the genetics of stomach, liver, and adipose 
gene expression from a morbidly obese cohort. Genome Res 2011;21:1008-16.

8 Brooks PJ, Enoch MA, Goldman D, Li TK, Yokoyama A. The alcohol flushing 
response: an unrecognized risk factor for esophageal cancer from alcohol 
consumption. PLoS Med 2009;6:e50.
http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000050 (accessed June 2013).

9 Jakobsson J, Ekstrom L, Inotsume N, Garle M, Lorentzon M, Ohlsson C, Roh HK,
Carlstrom K, Rane A. Large differences in testosterone excretion in Korean and 
Swedish men are strongly associated with a UDP-glucuronosyl transferase 2B17 
polymorphism. J Clin Endocrinol Metab 2006;91:687-93. 

10 Lampe JW, Bigler J, Horner NK, Potter JD. UDP-glucuronosyltransferase 
(UGT1A1*28 and UGT1A6*2) polymorphisms in Caucasians and Asians: 
relationships to serum bilirubin concentrations. Pharmacogenetics 1999;9:341-9. 

11 Kauffmann A, Gentleman R, Huber W. ArrayQualityMetrics--a bioconductor
package for quality assessment of microarray data. Bioinformatics 2009;25:415-16. 

12 Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization 
methods for high density oligonucleotide array data based on variance and bias. 
Bioinformatics 2003;19:185-93.

13 Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E,
Veitch J, Collins PJ, Darvishi K. Integrated genotype calling and association analysis
of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet
2008;40:1253-60.

14 Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of
expression residuals (PEER) to obtain increased power and interpretability of gene 
expression analyses. Nat Protoc 2012;7:500-7. 

15 Feng S, Wang S, Chen CC, Lan L. GWAPower: a statistical power calculation software
for genome-wide association studies with quantitative traits. BMC Genet 2011;12:12.



16 Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation 
method for the next generation of genome-wide association studies. PLoS Genet 
2009;5:e1000529.

17 Delaneau O, Zagury JF, Marchini J. Improved whole chromosome phasing for
disease and population genetic studies. Nat Methods 2013;10:5-6. 

18 Wei R, Yang F, Urban TJ, Li L, Chalasani N, Flockhart DA, Liu W. Impact of the 
interaction between 3'-UTR SNPs and microRNA on the expression of human 
xenobiotic metabolism enzyme and transporter genes. Front Genet 2012;3:248. 

19 Duan S, Zhang W, Cox NJ, Dolan ME. FstSNP-HapMap3: a database of SNPs with 
high population differentiation for HapMap3. Bioinformation 2008;3:139-41. 

20 Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M, 
Pritchard JK. High-resolution mapping of expression-QTLs yields insight into human 
gene regulation. PLoS Genet 2008;4:e1000214.
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000214 (accessed June 2013).

21 Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated 
SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. 
PLoS Genet 2010;6:e1000888.
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000888 (accessed June 2013).

22 He M, Wu C, Xu J, Guo H, Yang H, Zhang X, Sun J, Yu D, Zhou L, Peng T, He Y, 
Gao Y, Yuan J, Deng Q, Dai X, Tan A, Feng Y, Zhang H, Min X, Yang X, Zhu J, 
Zhai K, Chang J, Qin X, Tan W, Hu Y, Lang M, Tao S, Li Y, Li Y, Feng J, Li D, 
Kim ST, Zhang S, Zhang H, Zheng SL, Gui L, Wang Y, Wei S, Wang F, Fang W, 
Liang Y, Zhai Y, Chen W, Miao X, Zhou G, Hu FB, Lin D, Mo Z, Wu T. A genome
wide association study of genetic loci that influence tumour biomarkers cancer
antigen 19-9, carcinoembryonic antigen and α fetoprotein and their associations
with cancer risk. Gut 2014;63:143-51. Published online first: 7 January 2013.

23 Nishida N, Sawai H, Matsuura K, Sugiyama M, Ahn SH, Park JY, Hige S, Kang JH, 
Suzuki K, Kurosaki M, Asahina Y, Mochida S, Watanabe M, Tanaka E, Honda M, 
Kaneko S, Orito E, Itoh Y, Mita E, Tamori A, Murawaki Y, Hiasa Y, Sakaida I, 
Korenaga M, Hino K, Ide T, Kawashima M, Mawatari Y, Sageshima M, 
Ogasawara Y, Koike A, Izumi N, Han KH, Tanaka Y, Tokunaga K, Mizokami M. 
Genome-wide association study confirming association of HLA-DP with protection 
against chronic hepatitis B and viral clearance in Japanese and Korean. PLoS One 
2012;7:e39175.
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0039175 (accessed Jun 2013).

24 Kamatani Y, Wattanapokayakit S, Ochi H, Kawaguchi T, Takahashi A, Hosono N, 
Kubo M, Tsunoda T, Kamatani N, Kumada H. A genome-wide association study 
identifies variants in the HLA-DP locus associated with chronic hepatitis B in Asians. 
Nat Genet 2009;41:591-5. 

25 Mbarek H, Ochi H, Urabe Y, Kumar V, Kubo M, Hosono N, Takahashi A, Kamatani Y, 
Miki D, Abe H. A genome-wide association study of chronic hepatitis B identified novel 
risk locus in a Japanese population. Hum Mol Genet 2011;20:3884-92.

26 Mirkov S, Myers JL, Ramirez J, Liu W. SNPs affecting serum metabolomic traits may regulate 
gene transcription and lipid accumulation in the liver. Metabolism 2012;61:1523-7. 

27 Speliotes EK, Yerges-Armstrong LM, Wu J, Hernaez R, Kim LJ, Palmer CD, 
Gudnason V, Eiriksdottir G, Garcia ME, Launer LJ, Nalls MA, Clark JM, Mitchell BD,
Shuldiner AR, Butler JL, Tomas M, Hoffmann U, Hwang SJ, Massaro JM,
O'Donnell CJ, Sahani DV, Salomaa V, Schadt EE, Schwartz SM, Siscovick DS,
NASH CRN, GIANT Consortium, MAGIC Investigators, Voight BF, Carr JJ, Feitosa MF,
Harris TB, Fox CS, Smith AV, Kao WH, Hirschhorn JN, Borecki IB, GOLD Consortium.
Genome-wide association analysis identifies variants associated with nonalcoholic
fatty liver disease that have distinct effects on metabolic traits. PLoS Genet 2011;7.
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1001324 (accessed June 2013).

28 Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, Holm H, Sanna S, 
Kavousi M, Baumeister SE, Coin LJ, Deng G, Gieger C, Heard-Costa NL, Hottenga JJ, 
Kuhnel B, Kumar V, Lagou V, Liang L, Luan J, Vidal PM, Mateo Leach I, O'Reilly PF, 
Peden JF, Rahmioglu N, Soininen P, Speliotes EK, Yuan X, Thorleifsson G, Alizadeh BZ, 
Atwood LD, Borecki IB, Brown MJ, Charoen P, Cucca F, Das D, de Geus EJ, Dixon AL, 
Doring A, Ehret G, Eyjolfsson GI, Farrall M, Forouhi NG, Friedrich N, Goessling W,
Gudbjartsson DF, Harris TB, Hartikainen AL, Heath S, Hirschfield GM, Hofman A, 
Homuth G, Hypponen E, Janssen HL, Johnson J, Kangas AJ, Kema IP, Kuhn JP, Lai S, 
Lathrop M, Lerch MM, Li Y, Liang TJ, Lin JP, Loos RJ, Martin NG, Moffatt MF, 
Montgomery GW, Munroe PB, Musunuru K, Nakamura Y, O'Donnell CJ, Olafsson I, 
Penninx BW, Pouta A, Prins BP, Prokopenko I, Puls R, Ruokonen A, Savolainen MJ,
Schlessinger D, Schouten JN, Seedorf U, Sen-Chowdhry S, Siminovitch KA, Smit JH, 
Spector TD, Tan W, Teslovich TM, Tukiainen T, Uitterlinden AG, Van der Klauw MM, 
Vasan RS, Wallace C, Wallaschofski H, Wichmann HE, Willemsen G, Wurtz P, Xu C, 
Yerges-Armstrong LM, Abecasis GR, Ahmadi KR, Boomsma DI, Caulfield M,
Cookson WO, van Duijn CM, Froguel P, Matsuda K, McCarthy MI, Meisinger C,
Mooser V, Pietilainen KH, Schumann G, Snieder H, Sternberg MJ, Stolk RP, Thomas HC, 
Thorsteinsdottir U, Uda M, Waeber G, Wareham NJ, Waterworth DM, Watkins H, 
Whitfield JB, Witteman JC, Wolffenbuttel BH, Fox CS, Ala-Korpela M, Stefansson K, 
Vollenweider P, Volzke H, Schadt EE, Scott J, Jarvelin MR, Elliott P, Kooner JS. 
Genome-wide association study identifies loci influencing concentrations of liver 
enzymes in plasma. Nat Genet 2011;43:1131-8.

29 Gust ID. Epidemiology of hepatitis B infection in the Western Pacific and South East 
Asia. Gut 1996;38(Suppl 2):18-23. 






30 Han Y, Pei Y, Liu Y, Zhang L, Wu S, Tian Q, Chen X, Shen H, Zhu X, Papasian CJ, 
Deng H. Bivariate genome-wide association study suggests fatty acid desaturase 
genes and cadherin DCHS2 for variation of both compressive strength index and 
appendicular lean mass in males. Bone 2012;51:1000-7. 

31 Hong KW, Jin HS, Song D, Kwak HK, Soo Kim S, Kim Y. Genome-wide association study 
of serum albumin:globulin ratio in Korean populations. J Hum Genet 2013;58:174-7.



32 Hong MG, Karlsson R, Magnusson PK, Lewis MR, Isaacs W, Zheng LS, Xu J, Gronberg H, 
Ingelsson E, Pawitan Y, Broeckling C, Prenni JE, Wiklund F, Prince JA. A genome-wide 
assessment of variability in human serum metabolism. Hum Mutat 2013;34:515-24.

33 Hu L, Zhuo W, He YJ, Zhou HH, Fan L. Pharmacogenetics of P450 oxidoreductase: 
implications in drug metabolism and therapy. Pharmacogenet Genom 2012;22:812-19. 

34 Hines RN. Ontogeny of human hepatic cytochromes P450. J Biochem Mol Toxicol 
2007;21:169-75. 






